Bulletpapers - Understand complex papers in seconds

May 2024

Transforming Text into Images, Videos, 3D Objects and Audio via Flow-based Diffusion Transformers

This paper introduces Lumina-T2X, a family of flow-based large diffusion transformers designed to transform noise into images, videos, 3D objects and audio conditioned on text. Key techniques like tokenized representations, learnable placeholders, RoPE, RMSNorm and flow matching enable unified training and flexible generation across modalities and resolutions. Models sc...
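As a rough illustration of the flow matching objective mentioned in this summary, the sketch below builds one training pair using a straight-line path between noise and data. The linear path, the function names `flow_matching_pair` and `squared_error`, and the toy vectors are assumptions for illustration; the summary does not specify the exact formulation Lumina-T2X uses.

```python
import random


def flow_matching_pair(noise, data, t):
    """Return (x_t, target_velocity) for a straight-line path from noise to data.

    Assumed path: x_t = (1 - t) * noise + t * data, so dx_t/dt = data - noise.
    """
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    velocity = [d - n for n, d in zip(noise, data)]
    return x_t, velocity


def squared_error(predicted, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


rng = random.Random(0)
data = [0.5, -1.0, 2.0]                      # toy "clean" sample (hypothetical)
noise = [rng.gauss(0.0, 1.0) for _ in data]  # Gaussian starting point
x_t, v_target = flow_matching_pair(noise, data, t=0.3)
# A trained network would be asked to predict v_target from (x_t, t, text condition).
```

With this path, `t = 0` returns the noise sample and `t = 1` returns the data sample, so the regression target is the same constant velocity at every point along the line.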

May 2024

Expert-based motion planning for high speed navigation

This paper presents a new motion planning method for high speed robot navigation in cluttered environments. It uses a generative model called a normalizing flow to sample diverse, expert-styled motion trajectories. An accelerated collision checking method rejects poor samples before evaluation. In cluttered test environments, this planner shows comparable performance to...

May 2024

Exploring Markov jump processes for generative modeling

This paper studies Markov jump processes on discrete state spaces for generative modeling. It shows the time-reversed Ehrenfest process, which converges to a discrete state space analog of the Ornstein-Uhlenbeck process used in continuous generative models, bridges discrete and continuous state spaces. This allows transferring methods between settings. A new loss functi...

April 2024

Realistic Material Assignment for 3D Objects

This paper presents Make-it-Real, a novel approach that leverages large language models to identify materials from images and assign them to 3D objects. This allows creating realistic material properties for existing 3D assets and generated models.

April 2024

Measuring neural network reliance on interpretable features

This paper proposes a method to determine whether deep neural networks rely on specific, human-interpretable features when making predictions. It works by "removing" an interpretable feature from test data by adjusting images to have the same baseline value for that feature, while keeping the images realistic using a generative model. If the network's performance drops ...

April 2024

Discrete diffusion for electronic medical record generation

This paper introduces a new generative model, EHR-D3PM, to synthesize artificial but realistic electronic health records (EHRs). EHR-D3PM uses discrete diffusion, which is tailored for categorical data, to overcome issues with previous generative models and generate high-quality tabular medical codes. Experiments show EHR-D3PM significantly outperforms existing methods ...

April 2024

Generative modeling using physics and neural networks

This paper proposes methods to improve the accuracy and robustness of generative models that combine physics knowledge with neural networks. It uses normalizing flows to better model the structure of data, and attention mechanisms to make the model robust to noise.

April 2024

Modeling cooperative world dynamics

This paper proposes a method to model world dynamics conditioned on multiple agents' actions, to enable decentralized agents with only partial views to cooperate effectively. It uses generative models to estimate the full world state from partial views. Then it learns a compositional world model that can simulate outcomes of joint actions by multiple agents. Combined wi...

April 2024

Learning sparse hierarchical images

This paper introduces a new generative model for images called the Sparse Random Hierarchy Model (SRHM). The model generates images by composing visual concepts in a hierarchical way, similar to how complex images are built from simpler building blocks. Additionally, the model introduces sparsity, so that only a few 'informative features' are embedded within larger regi...

April 2024

Learning 3D Models from Unposed Images

This paper proposes a method to train 3D-aware generative adversarial networks (GANs) on images without known camera pose distributions. It introduces a template feature field to estimate poses on-the-fly during training. This allows learning complete 3D geometry from challenging unposed datasets.

April 2024

Nonlinear activation-free diffusion model for document restoration

This paper proposes a new generative framework called NAF-DPM to restore degraded documents. It uses an efficient nonlinear activation-free network within a diffusion probabilistic model to eliminate noise and sharpen text, while preserving key features. A fast sampling method allows it to generate high-quality restorations in just a few iterations. Additional technique...

April 2024

Category-level object pose learning without annotations

This paper proposes a method to learn a category-level 3D object pose estimator without requiring manually annotated pose data. Instead, it leverages generative diffusion models like Zero-1-to-3 to synthesize images of objects under controlled pose variations. To handle artifacts and noise, an image encoder learns pose features via contrastive learning. A novel strategy...

April 2024

Embedding Watermarks in Stable Diffusion Models

This paper proposes a plug-and-play framework to embed watermarks in Stable Diffusion models without retraining. The watermarks are embedded in latent space and adapt to the denoising process. Results show effective balance of image quality and watermark invisibility, robustness to attacks, and generalization across SD versions.

April 2024

Dynamic backtracking for efficient exploration in generative flow networks

This paper introduces a novel generative flow network (GFN) variant called dynamic backtracking GFN that allows the model to backtrack and correct past decisions during sampling. This enhances the adaptability and efficiency of exploration, helping avoid getting stuck in local optima. When applied to tasks like generating biochemical molecules and genetic sequences, the...

April 2024

Improving 3D Understanding from Multiple Views

The authors introduce a system called SAP3D that can reconstruct 3D models and generate novel views of objects from an arbitrary number of input images. As more images are provided, SAP3D adapts its internal generative model to better match the specific object instance, improving reconstruction and view synthesis quality. This bridges the gap between single-image method...

April 2024

Next-scale autoregressive image modeling

This paper introduces a new type of autoregressive image modeling called visual autoregressive (VAR) modeling, which generates images in a multi-scale, coarse-to-fine manner rather than the standard left-to-right, pixel-by-pixel approach. On an ImageNet benchmark, VAR allows autoregressive transformers to surpass diffusion models in image quality, diversity, efficiency ...

April 2024

Aligning images in 3D without poses

This paper introduces 3D Congealing to align 2D images capturing similar objects into a shared 3D space, without assuming poses or camera parameters. A framework is proposed that optimizes a canonical 3D representation to be consistent with both a pre-trained generative model and semantic information from input images. This enables applications like pose estimation and ...

April 2024

Universal representations of financial transactions

This paper presents a framework for learning universal representations of financial transaction data that are effective for diverse business problems. The authors propose novel generative models tailored to transaction data specifics and methods to incorporate external contextual information from other customers' activity. They also introduce a comprehensive benchmark t...

March 2024

Enhancing vision-language models

This paper introduces Mini-Gemini, a framework to enhance vision-language models like GPT-4 and Gemini. It improves performance and expands capabilities in image understanding, reasoning, and generation. Key aspects include efficient high-resolution visual tokens, high-quality training data, and integration with generative models.

March 2024

Synthetic lymph node generation for segmentation

This paper presents a pipeline integrating a generative model and a segmentation model. The generative model, LN-DDPM, synthesizes realistic abdominal lymph nodes using masks as conditions via global and local conditioning. This generates diverse paired data to train the segmentation model, nnU-Net, to improve lymph node segmentation.

March 2024

Generative AI for Medical Imaging Data Augmentation

This paper explores using generative AI models like GANs to synthesize medical images, addressing issues like small datasets and lack of diversity. Models are trained on real data then used to generate realistic synthetic images, increasing data quantity and variety. This enables more robust training of ML models for tasks like anomaly detection and diagnosis. Experimen...

March 2024

Detecting fake images using natural image statistics

This paper proposes detecting fake images by comparing them to stable 'natural traces' learned from real images, instead of focusing on artifacts from specific generative models. The authors use statistical properties consistently present in real images to train a model to distinguish real from fake. Evaluation shows high accuracy in detecting various state-of-the-art...

March 2024

Survey of Long Video Generation Techniques

This paper provides the first comprehensive survey of recent advancements in generating videos longer than 10 seconds or 100 frames. It summarizes these techniques into two key paradigms: divide-and-conquer, which separates keyframes from filling frames, and temporal autoregressive, which iteratively generates clips conditioned on prior frames. The survey analyzes commo...

March 2024

Unifying Diffusion Schrödinger Bridge and Score-based Models

This paper proposes a simplified theoretical formulation of the Diffusion Schrödinger Bridge (DSB) framework that enables integration with Score-based Generative Models (SGMs). By leveraging SGM as an initialization for DSB, faster convergence and improved performance are achieved. A reparameterization technique is also introduced that further enhances the network's cap...

March 2024

Zero-shot text-guided video translation

This paper introduces FRESCO, a new zero-shot framework that adapts image diffusion models to translate videos according to text prompts. FRESCO establishes robust spatial-temporal constraints by preserving both intra-frame spatial correspondence and inter-frame temporal correspondence of the input video. This coherence guidance, applied at both the attention and featur...

March 2024

Fast Personalized 3D Model Generation

This paper introduces Make-Your-3D, a novel method to quickly generate high-quality, customizable 3D models of a subject from a single image and text description in just 5 minutes. It works by optimizing and aligning a multi-view diffusion model and 2D generative model to match the distribution of the desired 3D subject.

March 2024

Connecting language and vision for text-to-image

This paper explores integrating different language models and generative vision models for text-to-image generation. It proposes LaVi-Bridge, a flexible framework that connects pre-trained language and vision modules using adapters and LoRA. This allows incorporating superior models to improve capabilities without retraining entire pipelines.

March 2024

Simplified analysis of push-forward constraints in machine learning

This paper provides theoretical analysis showing that push-forward constraints, which redistribute probability measures through functions, are generally nonconvex. It discusses the implications in machine learning, where such constraints play a key role in generative modeling and algorithmic fairness. The nonconvexity poses limitations on designing convex optimization p...

February 2024

Disentangling Technical and Biological Factors in Retinal Imaging

This paper introduces a generative model that disentangles technical image factors like camera type from biological factors like patient ethnicity and age in retinal fundus images. It enables controllable, realistic image generation and avoids shortcut learning of spurious correlations. A novel disentanglement loss based on distance correlation is proposed and shown to ...

February 2024

Fast mobile text-to-speech with discrete codecs

MobileSpeech is a fast, lightweight, and robust zero-shot text-to-speech system for mobile devices. It uses a parallel speech mask decoder with hierarchical discrete codec representations and cross-attention on text and speech prompts. This achieves state-of-the-art speed and quality on multilingual datasets.

February 2024

Score-based models: analyzing the noise schedule

This paper provides theoretical analysis and optimization of the noise schedule in score-based generative models. These models learn to generate new samples by adding noise to training data, then estimating score functions to reverse the noise process. The paper shows the noise schedule impacts model accuracy, and explicitly relates it to bounds on divergence between th...

February 2024

Point cloud density equalization

This paper introduces a method called the Lennard-Jones layer that rearranges points in 2D or 3D point clouds to normalize their density without distorting the overall shape. It works by simulating repulsive and attractive forces between points over time. The authors show this can redistribute random point sets into uniform distributions and embed it in neural networks ...
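The repulsive and attractive forces described in this summary can be sketched with the textbook Lennard-Jones force between pairs of 2D points. This is a minimal sketch, not the authors' layer: the step size, the displacement clipping, and the function names `lj_force` and `lj_step` are assumptions added for illustration.

```python
import math


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Signed Lennard-Jones force at distance r (positive = repulsive).

    Derived from V(r) = 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6).
    """
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 * sr6 - sr6)


def lj_step(points, step=0.01, max_disp=0.05, epsilon=1.0, sigma=1.0):
    """Move every 2D point once along its net pairwise force.

    Displacements are clipped to max_disp (an assumed safeguard) because the
    force grows very quickly when two points are close together.
    """
    new_points = []
    for i, (xi, yi) in enumerate(points):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            r = math.hypot(dx, dy)
            if r == 0.0:
                continue  # coincident points: direction undefined, skip the pair
            f = lj_force(r, epsilon, sigma)
            fx += f * dx / r
            fy += f * dy / r
        disp_x, disp_y = step * fx, step * fy
        norm = math.hypot(disp_x, disp_y)
        if norm > max_disp:
            disp_x *= max_disp / norm
            disp_y *= max_disp / norm
        new_points.append((xi + disp_x, yi + disp_y))
    return new_points
```

Points closer than the equilibrium distance `2 ** (1 / 6) * sigma` push each other apart, and points slightly beyond it pull together, which is what drives a point set toward even spacing over repeated steps.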

January 2024

Active learning for skeleton action generation

The paper proposes an active generative network that can generate diverse, realistic skeleton action data from very few or even just one sample action. It transfers the 'style' of a source action to the 'content' of a target action, preserving category and morphology. An uncertainty sampling technique chooses the most valuable generated samples to enhance quality.

January 2024

Deconstructing diffusion models for self-supervised image representation learning

This paper deconstructs modern Denoising Diffusion Models (DDMs), which are powerful generative models, to understand their ability for self-supervised representation learning. Through step-by-step simplification, the authors push DDM towards a classical Denoising Autoencoder (DAE), finding that only a few key components are critical. Ultimately they arrive at a simplif...

January 2024

Simple generative model for panoptic segmentation

This paper proposes a simple generative approach using latent diffusion models for panoptic segmentation. It trains a shallow autoencoder to compress segmentation masks into a latent space, then learns an image-conditioned diffusion process to generate masks. Key advantages are simplicity, generality, and enabling mask completion applications.

January 2024

User-guided part regeneration for 3D shapes

This paper introduces techniques to allow users to generate multiple, varied suggestions for individual parts of 3D shapes synthesized with neural networks. The authors perform experiments with different multimodal generative models to produce diverse part suggestions, using a comparative framework they develop for part-based shape synthesis. Evaluations show conditiona...

January 2024

Efficient knowledge transfer through noise-based distillation

This paper proposes Generative Denoise Distillation (GDD), a knowledge distillation method in which stochastic noise is added to the student model's features to produce a more compact representation. This technique is inspired by human learning processes and facilitates efficient alignment between student and teacher models. Experiments demonstrate state-of-t...

January 2024

Instant 3D Gaussian Generation

This paper introduces AGG, an amortized generative framework that produces 3D Gaussians from a single image input in one shot, without needing per-instance optimization. AGG utilizes a cascaded pipeline with a coarse hybrid generator to predict geometry and texture, and a super-resolution module to refine the Gaussians. Compared to optimization-based methods, AGG showca...

January 2024

Generating realistic single cell RNA data with diffusion models

This paper introduces scDiffusion, a novel generative model for producing high-quality simulated single cell RNA sequencing data. It is based on the diffusion model framework from computer vision, but adapted for gene expression data. scDiffusion can generate data that closely matches real data, and allows control over cell types and other conditions. This enables augme...

January 2024

Lifting 2D Portrait Features to 3D for Controllable Synthesis

This paper proposes a new method called 3D-SSGAN that enables controllable 3D-aware portrait image synthesis. It lifts 2D portrait features to 3D using a simple yet effective depth-guided module. This allows combining the disentangled modeling of 2D GANs with 3D-aware rendering for the first time. The results show the method can synthesize high quality, 3D consistent po...

January 2024

Sculpting 3D Humans from Synthetic Data

This paper presents a generative scheme called En3D that can produce high-quality, realistic, and diverse 3D human avatars without relying on scarce real-world 3D or 2D data. Instead, it trains on a large set of synthetic 2D human images rendered from 3D scenes with known camera parameters. Optimization modules refine the geometry and texture quality. The results signif...

December 2023

Continual learning through generative models

This paper introduces a new method called Adapt & Align for training neural networks to continually learn from new data distributions without forgetting previous knowledge. It works by splitting the training process into two phases - first, a local generative model (like a VAE or GAN) learns representations of just the new data; second, these representations are aligned...

December 2023

Efficiently Adapting Graph Neural Networks by Reconstructing Generative Patterns

This paper identifies the discrepancy in generative patterns between pre-training and downstream graphs as the fundamental cause of poor fine-tuning performance. It proposes G-Tuning, which tunes the pre-trained GNN to reconstruct the generative patterns (graphon) of downstream graphs. A theoretical analysis shows graphons can be approximated by a linear combination of ...

December 2023

Inverting diffusion models for interpretable text prompts

This paper focuses on inverting text-to-image diffusion models to directly obtain interpretable text prompts that represent target images. The authors use a delayed projection scheme to optimize for prompts within the model's vocabulary space. By exploiting the observation that later diffusion timesteps cater to semantic information, prompt inversion in this range provides representative ...

December 2023

Fast 3D editing in latent space

This paper proposes Shap-Editor, a feed-forward neural network that can edit 3D assets in one second. It works by encoding assets into the latent space of Shap-E, where edits like adding hats or changing materials can be achieved through simple vector operations. Shap-Editor matches optimization-based approaches in quality while being orders of magnitude faster since it...

December 2023

Analyzing generative diffusion models using corrupted inputs

This paper introduces Diffusion-C, a methodology to test diffusion generative models like DDPM and DDIM using corrupted image inputs. Through experiments, the authors find DDPM performs best overall, with fog and fractal corruptions specifically challenging model robustness due to statistical similarities with the models' noise components.

December 2023

Photorealistic 3D generation from images

This paper proposes a new method to generate high-quality, diverse, and photorealistic 3D objects from a single input image and text description. It works by training a generative adversarial network to match the distribution of multi-view renderings to that of a pre-trained diffusion model. This avoids common issues like over-smoothing and saturation. The method enable...

December 2023

Transformer diffusion models for image and video generation

This paper explores the use of transformer architectures as an alternative to traditional CNN-based U-Nets in diffusion models for image and video generation. The introduced model, GenTron, adapts the Diffusion Transformer (DiT) architecture for text-conditional generation. Through scaling experiments and novel techniques like motion-free guidance for video, GenTron dem...

December 2023

3D Model Creation from Photos

This paper introduces HyperDreamer, a system that enables creating realistic and detailed 3D models from a single photo. It generates high-resolution 3D meshes that look compelling from any viewing angle. The materials and lighting are also modeled realistically. Users can easily select and edit regions of the generated 3D model via text prompts.

December 2023

Generating fluid configurations in fractures with diffusion models

Researchers developed a hybrid approach combining generative diffusion models and physics-based modeling to generate fluid configurations in fractures. This overcomes limitations of standard methods like high computational cost and non-unique solutions. Their approach trains a model to produce samples that serve as initial conditions for simulations, reducing time for c...

December 2023

Fixing Anatomical Issues in AI-Generated Hand Images

Researchers developed a pipeline to identify and correct anatomical inaccuracies in images of hands generated by Stable Diffusion. They constructed a specialized dataset of hand images to effectively train models for detecting irregularities. Key steps also include estimating hand positioning via body pose analysis, and refining image areas using ControlNet and Instruct...

December 2023

In-Context Image Generation

This paper proposes Context Diffusion, a new framework for image generation models to learn from visual examples provided alongside a query image and optional text prompt. It separates encoding of visual context from structure of the query image, enabling the model to leverage either visual or textual input. Experiments show it performs well on diverse in-domain and out...

November 2023

Framework for Generating Crystal Structures

This paper introduces a generative design framework called WyCryst to produce new inorganic crystal structures while ensuring they follow fundamental symmetry rules. It has three main components: a representation based on Wyckoff positions that encode symmetry, a variational autoencoder model that learns crystal distributions, and an automated DFT workflow to refine str...

November 2023

Fast 3D object generation from text

This paper proposes a novel framework called Instant3D that can generate a 3D object from a text description in under one second. It uses a feedforward network to directly construct a 3D triplane representation from the text prompt. The key innovation is exploring strategies to effectively inject the text condition into the network, including cross-attention, style inje...

November 2023

Diffusion model efficiently predicts physics simulations

This paper proposes a diffusion-based generative model to predict solutions of physics simulations, like partial differential equations. It gradually adds noise to real data, then learns to reverse this process to generate new samples. This approach leverages multi-fidelity data and is more accurate than prior methods.
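The "gradually adds noise" step in this summary can be illustrated with the closed-form forward process used by standard denoising diffusion models. This is a sketch under assumptions: the linear beta schedule, its default endpoints, and the names `linear_alpha_bars` and `forward_noise` are illustrative and are not taken from the paper.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1).

    Returns both x_t and eps, since eps is the usual regression target when
    learning to reverse the process.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal = math.sqrt(alpha_bar)
    scale_noise = math.sqrt(1.0 - alpha_bar)
    x_t = [scale_signal * x + scale_noise * e for x, e in zip(x0, eps)]
    return x_t, eps


rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
field = [0.2, 0.4, 0.1]  # toy stand-in for a simulation solution (hypothetical)
noisy_field, eps = forward_noise(field, alpha_bars[500], rng)
```

Early steps keep `alpha_bar` close to 1, so the sample is nearly unchanged; late steps drive it toward 0, leaving almost pure Gaussian noise.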

November 2023

Accelerating image generation with latent consistency

This paper introduces latent consistency models (LCMs), which accelerate high-quality image generation from text prompts. LCMs predict solutions directly in latent space, bypassing iterative sampling. By distilling an autoencoder-based diffusion model into an LCM, images can be generated in just 1-4 steps while maintaining quality. Further, LCM-LoRA allows plug-and-play...

November 2023

Generating ship hull forms using GANs

This paper proposes using a conditional Wasserstein GAN to generate ship hull forms based on performance parameters like drag coefficient and tonnage, rather than geometric parameters. The GAN model is trained on a dataset of hull forms generated from the generalized Wigley mathematical hull form. Once trained, the GAN generator can output new hull forms by specifying d...

November 2023

Quantum modeling of sequences with trainable embeddings

This paper explores using 'Born machines', a quantum-inspired generative model, for modeling sequential data like RNA sequences. It introduces trainable token embeddings as quantum measurement operators, instead of fixed one-hot encodings. This allows packing more tokens in a smaller Hilbert space and adjusting the model's physical dimension as a hyperparameter. Results...

November 2023

Identifying key components for predicting molecular properties

This paper proposes a new generative model to identify the key components relevant for predicting molecular properties. By modeling the process generating molecular data, the model can disentangle semantic and non-semantic features. This allows better generalization to new molecular data distributions. The model identifies semantic atom variables and semantic substructu...

November 2023

3D generative model produces brain MR images and segmentations

This paper proposes a 3D generative model that can produce multi-modal brain MR images along with corresponding segmentations. The model allows conditioning on brain pathologies, generating paired synthetic images and labels. This enables robust training of segmentation models, even when real training data lacks certain phenotypes. The model was shown to improve white m...

November 2023

Learning from chaos

This perspective article connects classical nonlinear dynamics to modern machine learning, arguing that concepts from chaos theory and dynamical systems may inform the development of large-scale generative models. The authors revisit historical works on attractor reconstruction, symbolic dynamics, and complexity-entropy relations, finding parallels with contemporary met...

November 2023

Detecting out-of-distribution data using unreliable generated sources

This paper proposes a method to detect out-of-distribution data without access to real out-of-distribution samples. It uses a generated auxiliary detection task with unreliable out-of-distribution data to train the model, while ensuring the task benefits real out-of-distribution detection.

November 2023

Efficient image communication for AIoT using deep semantic segmentation and restoration

This paper proposes a novel deep image semantic communication model to enable efficient image transmission for Artificial Intelligent Internet of Things (AIoT) devices. At the transmitter, a high-precision semantic segmentation algorithm extracts key image semantics, significantly compressing data. At the receiver, a Generative Adversarial Network (GAN) restores the sem...

November 2023

Multi-Generator Model for Pedestrian Trajectory Prediction

This paper proposes a multi-generator model to predict pedestrian trajectories in autonomous driving. It captures complex social interactions and disconnected manifolds in trajectory distributions using a fused spatiotemporal graph and flexible generator selection. The model achieves state-of-the-art performance in reducing unrealistic out-of-distribution samples.

November 2023

Diffusion models for generating driving scenarios

This paper proposes Scenario Diffusion, a novel generative model for creating driving scenarios to test autonomous vehicles. It uses a diffusion model to generate realistic distributions of agent poses, orientations and trajectories conditioned on a map image and optional tokens describing aspects of the desired scenario. This provides controllable scenario generation.

November 2023

Learning robot skills via generative simulation

This paper presents RoboGen, a system that uses generative models to automatically propose diverse robot learning tasks, generate corresponding 3D simulation environments, decompose tasks into subgoals, design reward functions, and acquire skills through reinforcement learning and other algorithms. RoboGen aims to unleash infinite data for automated robot learning at sc...

November 2023

Learning to generate realistic time series data

This paper proposes a new technique for generating synthetic time series data that captures both the stepwise conditional dynamics and joint distribution of full trajectories. It trains a forward-looking transition policy to imitate sequential behavior using an energy model for reinforcement. This avoids compounding errors of autoregressive models and instability of adv...

November 2023

Synthesizing building facade views from a single image

This paper introduces FacadeNet, a deep learning method that can synthesize realistic images of building facades from different viewpoints, given only a single input image of the facade. FacadeNet uses a conditional GAN equipped with a novel selective editing module to modify view-dependent elements like windows while preserving structure. Experiments demonstrate state-...

November 2023

Diffusion models improve reinforcement learning

This paper surveys recent work applying diffusion models in reinforcement learning. Diffusion models are a class of generative models that can capture complex, multi-modal distributions. The paper summarizes how diffusion models have been used as planners, policies, data synthesizers, and in other novel ways to improve offline RL, handle sparse rewards, increase sample ...

November 2023

Improving generative model samples with optimal rejection sampling

This paper proposes a rejection sampling method called Optimal Budgeted Rejection Sampling (OBRS) that improves sample quality from generative models like GANs. OBRS provably minimizes divergence between the true data distribution and the refined sample distribution for a given sampling budget. The paper also proposes a training procedure incorporating OBRS which furthe...
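To make the idea of a sampling budget concrete, the sketch below accepts each sample with probability `min(1, scale * ratio)` and tunes `scale` by bisection so the average acceptance rate meets a target. Whether this clipped density-ratio rule matches the paper's exact acceptance function is an assumption; the density ratios are taken as given here, and all function names are hypothetical.

```python
import random


def mean_acceptance(ratios, scale):
    """Average acceptance probability min(1, scale * ratio) over the samples."""
    return sum(min(1.0, scale * r) for r in ratios) / len(ratios)


def tune_scale(ratios, target_rate, iters=60):
    """Bisect for a scale whose mean acceptance is approximately target_rate."""
    positive_fraction = sum(1 for r in ratios if r > 0.0) / len(ratios)
    if not 0.0 < target_rate <= positive_fraction:
        raise ValueError("target_rate is not reachable for these ratios")
    lo, hi = 0.0, 1.0
    while mean_acceptance(ratios, hi) < target_rate:
        hi *= 2.0  # grow the bracket until the target rate is reached
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_acceptance(ratios, mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return hi


def budgeted_filter(samples, ratios, target_rate, rng):
    """Keep each sample with probability min(1, scale * ratio)."""
    scale = tune_scale(ratios, target_rate)
    return [s for s, r in zip(samples, ratios)
            if rng.random() < min(1.0, scale * r)]
```

A target rate of `0.5` corresponds to drawing roughly two candidates per kept sample. In practice the ratios would have to be estimated, for example from a GAN discriminator, which this sketch does not cover.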

November 2023

Regularizing GAN Training for Stability

This paper proposes a new method to stabilize generative adversarial network (GAN) training by preventing the discriminator from overfitting. It applies 'flooding', which flips gradients when losses become too low, directly to the adversarial loss. Through theory and experiments, the authors identify appropriate flood levels for common losses and show that flooding helps stabilize var...
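The gradient flip described in this summary comes from the general flooding transform, which reflects a loss around a chosen flood level. The sketch below shows that transform on a scalar loss; how the paper applies it to particular adversarial losses, and which flood levels it recommends, are not given in the summary, so the value `0.2` is purely illustrative.

```python
def flooded_loss(loss, flood_level):
    """Flooding: |loss - b| + b, which never drops below the flood level b."""
    return abs(loss - flood_level) + flood_level


def flooded_grad_scale(loss, flood_level):
    """Factor applied to the original gradient under flooding.

    +1 above the flood level (ordinary descent), -1 below it (the gradient
    flips, pushing the loss back up). At exactly the flood level the
    derivative is undefined; 0.0 is returned as one valid subgradient.
    """
    if loss > flood_level:
        return 1.0
    if loss < flood_level:
        return -1.0
    return 0.0


# Illustrative discriminator losses around an assumed flood level of 0.2.
for d_loss in (0.5, 0.2, 0.1):
    print(d_loss, flooded_loss(d_loss, 0.2), flooded_grad_scale(d_loss, 0.2))
```

Once the discriminator loss falls below the flood level, the sign flip turns further minimization into ascent, which is the mechanism that keeps the discriminator from overfitting.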

October 2023

Artificial intelligence exceeds human capabilities

This paper argues that artificial intelligence now surpasses human abilities in many domains. Large language models can read, summarize, and generate content across disciplines. Generative AI creates novel images and art. These tools enable interdisciplinary digital humanities projects at a new scale, though they also raise ethical concerns.

October 2023

3D Foot Shape Reconstruction from Images

This paper introduces a method to reconstruct accurate 3D models of human feet from just a few RGB images. It uses a large synthetic dataset to train neural networks to predict surface normals and keypoints from images. These are then used to fit a generative foot model. This approach works well even with very few input views and outperforms traditional dense reconstruc...

October 2023

Convergence of flow-based generative models

This paper provides theoretical guarantees for flow-based generative models trained progressively via the Jordan-Kinderlehrer-Otto (JKO) scheme. It shows the model can generate data distributions up to total variation error O(ε) in O(log(1/ε)) JKO steps, under assumptions on the learning error. The analysis first proves exponential convergence of the JKO proximal gradi...
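The logarithmic step count is what exponential convergence gives in general. As a sketch, suppose the error after $n$ steps is $E_n$ and each step contracts it by a factor $\rho \in (0, 1)$; both symbols and the initial error $E_0$ are hypothetical and not taken from the paper, and the per-step learning error covered by the paper's assumptions is ignored here.

```latex
E_n \le \rho^{n} E_0,
\qquad
\rho^{n} E_0 \le \varepsilon
\iff
n \ge \frac{\log(E_0 / \varepsilon)}{\log(1 / \rho)}
= O\!\left(\log \frac{1}{\varepsilon}\right).
```

For example, with $\rho = 1/2$ and $E_0 = 1$, reaching $\varepsilon = 10^{-3}$ takes $n \ge \log_2 1000 \approx 9.97$, so 10 steps suffice.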

October 2023

Statistical physics of generative diffusion models

This paper shows generative diffusion models can be analyzed with tools from statistical physics. The diffusion time acts like temperature, and models undergo phase transitions and symmetry breaking. This provides insight into their generative capabilities.

October 2023

Synthetic data improves evaluation of ML models on subgroups and shifts

This paper proposes a framework called 3S Testing that uses deep generative models to create synthetic test data. This allows for more reliable evaluation of machine learning models, especially on underrepresented subgroups where real test data is limited. It also permits testing model performance under simulated distribution shifts. Experiments show 3S Testing provides...

October 2023

Generative multimodal key information extraction from documents

This paper proposes GenKIE, a novel generative model for extracting key information from scanned documents. It uses a multimodal encoder-decoder transformer architecture. The encoder embeds visual, layout, and textual features from documents. The decoder generates key entities by following prompt templates. A key advantage is robustness to OCR errors versus classificati...

October 2023

Video understanding via vision-language models

This paper proposes a framework that combines discriminative vision-language models with generative video-to-text and text-to-text models to improve zero-shot video understanding. The framework enhances visual features using video-to-text descriptions, and refines text classifiers with video-specific language prompts.

October 2023

Supervised training of generative models for density estimation

This paper proposes a new method to train generative models in a supervised manner for density estimation tasks. It utilizes score-based diffusion models to generate labeled data, then trains a neural network as the generative model using mean squared error loss. This avoids issues like unstable training and vanishing gradients faced by unsupervised methods. The key is ...

October 2023

Targeted attacks disrupt text-to-image models

This paper introduces prompt-specific poisoning attacks against text-to-image generative models. The authors show these models are vulnerable due to the sparsity of training data available for each concept. They propose an optimized attack called Nightshade that can successfully disrupt a model's ability to generate correct images for a targeted concept, using very few ...

October 2023

Language models for generating synthetic tabular data

This paper proposes Tabula, a new approach for generating synthetic tabular data using language models. Tabula is designed to accelerate training and improve synthesis quality compared to prior language model methods. Key innovations include using a randomly initialized model rather than a pre-trained one, iteratively refining the model on successive tasks, compressing ...

October 2023

Improving 3D Texture Synthesis with Pixel Gradient Clipping

This paper proposes a method called Pixel-wise Gradient Clipping (PGC) to enhance the quality of high-resolution 3D texture synthesis. PGC regulates the magnitude of pixel-wise gradients when generating 3D models using 2D image diffusion models, while preserving crucial texture details. It builds on traditional gradient clipping techniques but adapts them to work at the...
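As a rough illustration of clipping gradients per pixel rather than over a whole tensor, the sketch below rescales each pixel's gradient vector so that its norm does not exceed a threshold, leaving its direction unchanged. The exact rule used by PGC is not given in the summary, so this norm-based form, the function name `clip_pixel_gradients` and the threshold value are assumptions.

```python
import math


def clip_pixel_gradients(grad, max_norm):
    """Clip each pixel's gradient vector to at most `max_norm` in L2 norm.

    `grad` is a nested list indexed as grad[row][col][channel].
    Pixels already within the threshold are returned unchanged.
    """
    clipped = []
    for row in grad:
        new_row = []
        for pixel in row:
            norm = math.sqrt(sum(c * c for c in pixel))
            scale = 1.0 if norm <= max_norm else max_norm / norm
            new_row.append([c * scale for c in pixel])
        clipped.append(new_row)
    return clipped


# One row, two pixels: the first has norm 5 and is scaled down, the second is kept.
example = clip_pixel_gradients([[[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]]], max_norm=1.0)
```

Because the scale factor is computed independently for every pixel, a few pixels with very large gradients are damped without shrinking the gradients of all other pixels, which is the contrast with clipping by a single global norm.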

October 2023

Approach-Constrained Generative Grasp Sampling

This paper proposes CAPGrasp, a novel 6-degree-of-freedom continuous approach-constrained generative grasp sampler. CAPGrasp can generate grasps from any approach direction, overcoming limitations of prior discrete samplers. The authors also introduce an efficient training method that eliminates the need for massive labeled datasets, and a constrained refinement techniq...

October 2023

Flexible image editing with text

This paper proposes a method to edit images according to text descriptions, while keeping unrelated image content unchanged. It identifies a 'DeltaSpace' where visual and textual feature differences are aligned. This allows mapping between text features and latent generative model directions without needing text supervision during training. Their method trains a 'DeltaM...

October 2023

Language models for semantic search

This paper proposes a new method to learn semantic identifiers for documents using a self-supervised framework with a generative language model. The model is trained to generate semantic IDs that capture document semantics in a hierarchical structure. A semantic indexer encodes documents into discrete sequential IDs, and a reconstructor rebuilds documents from IDs via m...

October 2023

Generative forecasting on temporal knowledge graphs

This paper proposes a new framework called GenTKG for generative forecasting on temporal knowledge graphs. It combines a temporal logical rule-based retrieval strategy with lightweight parameter-efficient instruction tuning of large language models. This allows the models to generate accurate predictions for missing facts on a temporal knowledge graph based on historica...

October 2023

Text-to-video generation

This paper introduces ConditionVideo, a training-free approach to generating realistic videos from text descriptions and optional conditions like pose, without requiring large datasets. It leverages pre-trained image diffusion models like Stable Diffusion. ConditionVideo disentangles motion into condition-guided and background components, using separate noise vectors. I...

October 2023

Diffusion Models for Visual Media Synthesis

This paper provides a comprehensive overview of diffusion models, an emerging technique in AI for generating and manipulating visual media like images, video, and 3D graphics. Diffusion models have become the leading approach for creating photorealistic content from text prompts. The paper introduces the mathematical foundations, then surveys the rapid progress applying...

October 2023

Generating DNA Sequences with Diffusion Models

This paper proposes DiscDiff, a novel latent diffusion model tailored for generating synthetic DNA sequences. By embedding discrete DNA data into a continuous space using an autoencoder, DiscDiff leverages diffusion models' powerful generative capabilities for discrete data. The model demonstrates an ability to produce DNA sequences closely mirroring real DNA in motif d...

October 2023

Privacy-preserving data generation with model inversion

This paper proposes a new method called DPGOMI for generating synthetic data that preserves the privacy of sensitive real data. It first maps private data to the latent space of a public generator through an improved model inversion process. Then it trains a lower-dimensional differentially private GAN to model the latent space. DPGOMI outperforms prior methods in image...

October 2023

Score-based adversarial image generation

This paper introduces ScoreAG, a framework to generate adversarial examples for images that go beyond typical lp-norm constraints. It leverages recent advances in score-based generative models and diffusion guidance to synthesize new adversarial images, transform existing images into adversarial ones, and purify images to enhance classifier robustness. The key benefit i...

October 2023

Transporting probability densities with data-driven couplings

This paper introduces a framework to couple base and target probability densities in generative models built on transporting one density into another. It allows incorporating problem-specific structure like class labels or embeddings into the coupling. Experiments show performance gains on image super-resolution and inpainting.

October 2023

Street view synthesis from 3D data

This paper introduces MagicDrive, a framework to generate realistic street view images from 3D data like road maps, object boxes, and camera poses. It uses diffusion models for high image quality and proposes methods to encode the 3D data for precise control over the generated views. A key contribution is using cross-attention on object boxes and cross-view attention be...

October 2023

Interleaved image and text generation with generative visual tokens

This paper introduces a new technique for generating images and text together in a coherent, interleaved manner. It uses 'generative visual tokens' to connect large language models with image generation models. A two-stage training approach focuses first on aligning tokens with images, then fine-tuning on multimodal data. This allows the model to produce high-quality, c...

October 2023

Improving language model answer consistency

This paper proposes methods to improve the consistency between a language model's text generation and validation capabilities. The authors identify 'generator-validator inconsistency' as an issue where models like ChatGPT can correctly solve a math problem but incorrectly validate the answer. They develop techniques to measure and enhance consistency across text generat...

September 2023

Learning to sample high-quality candidates from partial ordering

This paper proposes Order-Preserving Generative Flow Networks (OP-GFNs) to sample candidates with probabilities proportional to a learned reward function consistent with a provided partial order, without needing an explicit scalar reward. OP-GFNs balance exploration and exploitation by gradually sparsifying the reward landscape during training. Experiments show OP-GFNs ...

September 2023

Causality-guided data-free neural network compression

This paper proposes a novel method called Causal-DFQ that uses causal reasoning to enable data-free neural network quantization, eliminating the need for real training data. It constructs a causal graph to model data generation and discrepancy reduction between pre-trained and quantized models. A content-style decoupled generator synthesizes images conditioned on releva...

September 2023

Improving reasoning in large language models

This paper shows that Contrastive Decoding, a simple method that maximizes the difference in likelihood between a strong and weak language model, improves reasoning and text generation from large language models. It achieves state-of-the-art performance on benchmarks including mathematical word problems and commonsense reasoning.
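A minimal sketch of one decoding step may help make this concrete. It picks the token that maximizes log p_strong minus log p_weak. The plausibility cutoff `alpha`, which keeps only tokens the strong model itself rates as reasonably likely, is an assumption borrowed from the general contrastive decoding setup rather than from this summary, and the function name and toy probabilities are hypothetical.

```python
import math


def contrastive_pick(strong_probs, weak_probs, alpha=0.1):
    """Return the token with the largest log-likelihood gap between two models.

    `strong_probs` and `weak_probs` map each candidate token to a positive
    next-token probability. Tokens whose strong-model probability is below
    alpha * max(strong_probs) are skipped (assumed plausibility filter).
    """
    cutoff = alpha * max(strong_probs.values())
    best_token, best_score = None, -math.inf
    for token, p_strong in strong_probs.items():
        if p_strong < cutoff:
            continue
        score = math.log(p_strong) - math.log(weak_probs[token])
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy distributions: "b" is the token the strong model prefers far more than the weak one.
strong = {"a": 0.5, "b": 0.4, "c": 0.1}
weak = {"a": 0.6, "b": 0.2, "c": 0.2}
choice = contrastive_pick(strong, weak)
```

Without the cutoff, a very rare token could win simply because the weak model assigns it an even smaller probability, which is why some filter on the strong model's own probability is usually paired with the likelihood difference.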

September 2023

Efficient robot motion planning with learned generative models

This paper introduces CppFlow, a new algorithm for efficient robot motion planning that combines learned generative models and classical optimization. CppFlow uses a learned generative inverse kinematics (IK) model to quickly produce candidate solutions on the GPU. These approximate solutions are refined into optimal, precise trajectories using optimization techniques l...

September 2023

Cartoon image generation with diffusion models

This paper introduces CartoonDiff, a new approach to generate cartoonized images using diffusion models like DDPM. It works by decomposing the diffusion sampling process into semantic and detail generation phases. By normalizing noise predictions during the detail phase, it removes fine textures while preserving major lines and contours. This achieves cartoon stylizatio...

The history of generative models